In this post I will use the NBA API to access shot chart data and use it to make some cool plots based on the shot zone information available in the raw data.

I wrote a package to access the NBA API. It can be found on my GitHub page (https://github.com/eyalshafran/NBAapi). The package also includes some plotting features, as I will show in this post. It is an ongoing project which will be updated as I keep working on this blog.

In [1]:
import NBAapi as nba
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from scipy import misc
%matplotlib inline 

First let's access the data and preview it:

In [2]:
shotchart,leagueaverage = nba.shotchart.shotchartdetail(season='2016-17') # get shot chart data from NBA.stats
shotchart.head()
Out[2]:
GRID_TYPE GAME_ID GAME_EVENT_ID PLAYER_ID PLAYER_NAME TEAM_ID TEAM_NAME PERIOD MINUTES_REMAINING SECONDS_REMAINING ... SHOT_ZONE_AREA SHOT_ZONE_RANGE SHOT_DISTANCE LOC_X LOC_Y SHOT_ATTEMPTED_FLAG SHOT_MADE_FLAG GAME_DATE HTM VTM
0 Shot Chart Detail 0021600001 2 201565 Derrick Rose 1610612752 New York Knicks 1 11 40 ... Center(C) Less Than 8 ft. 0 4 8 1 1 20161025 CLE NYK
1 Shot Chart Detail 0021600001 3 201567 Kevin Love 1610612739 Cleveland Cavaliers 1 11 26 ... Center(C) Less Than 8 ft. 3 -11 36 1 0 20161025 CLE NYK
2 Shot Chart Detail 0021600001 5 2546 Carmelo Anthony 1610612752 New York Knicks 1 11 16 ... Right Side Center(RC) 16-24 ft. 19 148 129 1 0 20161025 CLE NYK
3 Shot Chart Detail 0021600001 7 204001 Kristaps Porzingis 1610612752 New York Knicks 1 11 15 ... Center(C) Less Than 8 ft. 2 24 -1 1 1 20161025 CLE NYK
4 Shot Chart Detail 0021600001 8 2544 LeBron James 1610612739 Cleveland Cavaliers 1 10 59 ... Left Side(L) 8-16 ft. 11 -79 80 1 1 20161025 CLE NYK

5 rows × 24 columns

Extracting zone-based statistics for each player

Each player has a unique player ID and also a name (which might not be unique). It is possible to work with the player ID alone, but I find it less informative when looking at the data, so I'm creating a new column (called PLAYER) which combines the player name and ID into a tuple.

I'm going to create a list of tuples with zone names which will be used later.

The shot zone is given by the combination of the 'SHOT_ZONE_RANGE' and 'SHOT_ZONE_AREA' columns. I will also use the 'SHOT_MADE_FLAG' column to see whether each shot was made or not. I'm going to use the groupby method to get a dataframe with zone-based information for each player. The size aggregation counts how many times a player shot from each zone, split by whether the shot was made or missed:
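To make the groupby/size/unstack chain concrete, here is a small sketch on a few made-up shots (the player names are placeholders, not real data). For simplicity it groups by PLAYER_NAME instead of the (name, ID) tuple; the reshaping works the same way.

```python
import pandas as pd

# Made-up shots using the same column names as the shot chart data
shots = pd.DataFrame({
    'PLAYER_NAME': ['A. Guard', 'A. Guard', 'A. Guard', 'B. Center'],
    'SHOT_ZONE_RANGE': ['24+ ft.', '24+ ft.', 'Less Than 8 ft.', 'Less Than 8 ft.'],
    'SHOT_ZONE_AREA': ['Center(C)', 'Center(C)', 'Center(C)', 'Center(C)'],
    'SHOT_MADE_FLAG': [1, 0, 1, 1],
})

# size() counts rows per (range, area, made flag, player) group,
# unstack() moves the last level (player) into the columns and fills missing combinations with 0,
# and .T transposes so each row is a player and each column a (range, area, made flag) triple
zones = (shots.groupby(['SHOT_ZONE_RANGE', 'SHOT_ZONE_AREA', 'SHOT_MADE_FLAG', 'PLAYER_NAME'])
              .size()
              .unstack(fill_value=0)
              .T)
print(zones)
```

B. Center never shot from 24+ ft., so fill_value=0 gives that player zeros in those columns instead of NaN.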

In [3]:
shotchart['PLAYER'] = list(zip(shotchart['PLAYER_NAME'],shotchart['PLAYER_ID'])) # list() needed on Python 3, where zip is lazy
zones_list = [(u'Less Than 8 ft.', u'Center(C)'),
              (u'8-16 ft.', u'Center(C)'),
              (u'8-16 ft.', u'Left Side(L)'),
              (u'8-16 ft.', u'Right Side(R)'),
              (u'16-24 ft.', u'Center(C)'),
              (u'16-24 ft.', u'Left Side Center(LC)'),
              (u'16-24 ft.', u'Left Side(L)'),
              (u'16-24 ft.', u'Right Side Center(RC)'),
              (u'16-24 ft.', u'Right Side(R)'),
              (u'24+ ft.', u'Center(C)'),
              (u'24+ ft.', u'Left Side Center(LC)'),
              (u'24+ ft.', u'Left Side(L)'),
              (u'24+ ft.', u'Right Side Center(RC)'),
              (u'24+ ft.', u'Right Side(R)'),
              (u'Back Court Shot', u'Back Court(BC)')]
# Create dataframe with PLAYER as index and the rest as columns
zones = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG','PLAYER']).size().unstack(fill_value=0).T
zones.head()
Out[3]:
SHOT_ZONE_RANGE 16-24 ft. ... 8-16 ft. Back Court Shot Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) ... Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
SHOT_MADE_FLAG 0 1 0 1 0 1 0 1 0 1 ... 0 1 0 1 0 1 0 1 0 1
PLAYER
(AJ Hammons, 1627773) 0 1 2 1 0 0 1 0 0 1 ... 1 0 1 0 2 0 0 0 5 4
(Aaron Brooks, 201166) 5 0 3 5 7 5 6 0 2 2 ... 5 4 6 7 7 10 7 0 57 39
(Aaron Gordon, 203932) 10 7 15 6 12 5 23 9 14 7 ... 20 14 31 17 18 14 2 0 128 219
(Aaron Harrison, 1626151) 0 0 1 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
(Adreian Payne, 203940) 1 2 3 1 0 0 1 2 0 0 ... 0 0 1 0 0 3 0 0 13 12

5 rows × 30 columns

The shot chart data does not say how many games each player played, so we will use the player biostats data to get that information:

In [4]:
players = nba.player.biostats(season='2016-17')
players['PLAYER'] = list(zip(players['PLAYER_NAME'],players['PLAYER_ID'])) # same (name, ID) key as the shot chart
players.set_index('PLAYER',inplace=True)
players.head()
Out[4]:
PLAYER_ID PLAYER_NAME TEAM_ID TEAM_ABBREVIATION AGE PLAYER_HEIGHT PLAYER_HEIGHT_INCHES PLAYER_WEIGHT COLLEGE COUNTRY ... GP PTS REB AST NET_RATING OREB_PCT DREB_PCT USG_PCT TS_PCT AST_PCT
PLAYER
(AJ Hammons, 1627773) 1627773 AJ Hammons 1610612742 DAL 24.0 7-0 84 260 Purdue USA ... 18 1.8 1.5 0.1 9.6 0.059 0.239 0.162 0.496 0.029
(Aaron Brooks, 201166) 201166 Aaron Brooks 1610612754 IND 32.0 6-0 72 161 Oregon USA ... 62 5.1 1.1 1.9 -3.9 0.023 0.067 0.196 0.504 0.220
(Aaron Gordon, 203932) 203932 Aaron Gordon 1610612753 ORL 21.0 6-9 81 220 Arizona USA ... 77 12.5 5.0 1.9 -2.6 0.054 0.139 0.200 0.524 0.098
(Aaron Harrison, 1626151) 1626151 Aaron Harrison 1610612766 CHA 22.0 6-6 78 210 Kentucky USA ... 5 0.2 0.6 0.6 -18.6 0.000 0.200 0.142 0.102 0.375
(Adreian Payne, 203940) 203940 Adreian Payne 1610612750 MIN 26.0 6-10 82 237 Michigan State USA ... 17 3.7 1.9 0.4 -3.6 0.071 0.204 0.234 0.505 0.083

5 rows × 23 columns

We need to merge the GP (games played) column from the players dataframe into the zones dataframe that we created earlier. Since both dataframes are indexed by the same PLAYER tuples, we can use pandas join.
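The same join can be tried on two toy frames (made-up numbers) to see why the GP label is padded to three levels first: after padding, both frames have the same column depth. As far as I know, newer pandas versions warn about or reject joins between frames whose columns have different numbers of levels, so the padding matters for more than just silencing a warning.

```python
import pandas as pd

players_idx = ['A. Guard', 'B. Center']
# Made-up zone counts with three column levels: (range, area, made flag)
zone_cols = pd.MultiIndex.from_tuples([('24+ ft.', 'Center(C)', 0),
                                       ('24+ ft.', 'Center(C)', 1)])
zones = pd.DataFrame([[3, 2], [1, 0]], index=players_idx, columns=zone_cols)

# Made-up games played, indexed by the same player labels
players = pd.DataFrame({'GP': [10, 4]}, index=players_idx)

# Pad the single GP label to three levels so both frames have the same column depth
GP = players[['GP']].copy()
GP.columns = pd.MultiIndex.from_product([GP.columns, [''], ['']])

zones_with_GP = zones.join(GP)  # rows are aligned on the shared index
print(zones_with_GP)
```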

In [5]:
GP = players.loc[:,['GP']] # create DataFrame with single GP column
GP.columns = pd.MultiIndex.from_product([GP.columns,[''],['']]) # change column to multiindex before join (prevents join warning)
zones_with_GP = zones.join(GP) # only include games played from players
zones_with_GP.columns = pd.MultiIndex.from_tuples(zones_with_GP.columns.tolist(), 
                                                  names=['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','MADE'])
zones_with_GP = zones_with_GP.sort_index(axis=1) # sort columns for faster MultiIndex lookups (sortlevel was removed from pandas)
zones_with_GP.head()
Out[5]:
SHOT_ZONE_RANGE 16-24 ft. ... 8-16 ft. Back Court Shot GP Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) ... Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
MADE 0 1 0 1 0 1 0 1 0 1 ... 1 0 1 0 1 0 1 0 1
PLAYER
(AJ Hammons, 1627773) 0 1 2 1 0 0 1 0 0 1 ... 0 1 0 2 0 0 0 18 5 4
(Aaron Brooks, 201166) 5 0 3 5 7 5 6 0 2 2 ... 4 6 7 7 10 7 0 62 57 39
(Aaron Gordon, 203932) 10 7 15 6 12 5 23 9 14 7 ... 14 31 17 18 14 2 0 77 128 219
(Aaron Harrison, 1626151) 0 0 1 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 5 0 0
(Adreian Payne, 203940) 1 2 3 1 0 0 1 2 0 0 ... 0 1 0 0 3 0 0 17 13 12

5 rows × 31 columns

Let's do some plotting!

Which players take the most shots per zone?

I already included some plotting tools in the package. For the court plot I adapted the code from the following blog post: http://savvastjortjoglou.com/nba-shot-sharts.html. I made some changes to the court function; the biggest one is working in feet instead of the tenths of a foot that the shot chart locations come in.
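As a quick sanity check on the units (a sketch, not part of the NBAapi package), the snippet below converts the five shots previewed in Out[2] to feet and compares the resulting distance with SHOT_DISTANCE. That LOC_X/LOC_Y are in tenths of a foot with the hoop at the origin is an assumption inferred from these rows.

```python
import numpy as np
import pandas as pd

# LOC_X, LOC_Y and SHOT_DISTANCE for the five shots previewed in Out[2]
shots = pd.DataFrame({'LOC_X': [4, -11, 148, 24, -79],
                      'LOC_Y': [8, 36, 129, -1, 80],
                      'SHOT_DISTANCE': [0, 3, 19, 2, 11]})

# Assumption: raw coordinates are tenths of a foot relative to the hoop,
# so dividing by 10 gives feet
shots['X_FT'] = shots['LOC_X'] / 10.0
shots['Y_FT'] = shots['LOC_Y'] / 10.0
shots['DIST_FT'] = np.hypot(shots['X_FT'], shots['Y_FT'])  # straight-line distance in feet

print(shots[['X_FT', 'Y_FT', 'DIST_FT', 'SHOT_DISTANCE']])
```

For these five rows, rounding DIST_FT down to whole feet reproduces SHOT_DISTANCE exactly; this is only checked against the rows shown here.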

I also wrote an nba.plot.text_in_zone function, which accepts a text string and a zone tuple and writes the text in the specified zone.

We need to sum over the 0s (missed shots) and 1s (made shots) to get the total shots, and then divide by the number of games played.
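A small sketch of that calculation for a single zone on made-up numbers, including the same "more than 10 games" filter used in the cell. Selecting only the first two column levels (the zone tuple) returns both the missed and the made columns, so summing across them gives total attempts.

```python
import pandas as pd

# Made-up counts for one zone; columns are (range, area, made flag) as in zones_with_GP
zone = ('24+ ft.', 'Center(C)')
counts = pd.DataFrame([[55, 24], [6, 2], [30, 20]],
                      index=['Player A', 'Player B', 'Player C'],
                      columns=pd.MultiIndex.from_tuples([zone + (0,), zone + (1,)]))
gp = pd.Series([62, 5, 20], index=counts.index)  # made-up games played

eligible = gp > 10                                 # drop players with 10 or fewer games
attempts = counts.loc[eligible, zone].sum(axis=1)  # missed (0) + made (1)
shots_pg = (attempts / gp[eligible]).sort_values(ascending=False)
print(shots_pg)
```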

In [6]:
path = os.path.dirname(nba.__file__) # get path of the nba module
floor = plt.imread(os.path.join(path, 'data', 'court.jpg')) # load floor template (scipy.misc.imread no longer exists)
plt.figure(figsize=(15,12.5),facecolor='white') # set up figure
ax = nba.plot.court(lw=4,outer_lines=False) # plot NBA court - don't include the outer lines
ax.axis('off')
nba.plot.zones(color='white',linewidth=3) # pass the line width once (lw and linewidth are aliases)
eligible = zones_with_GP.loc[:,'GP'].values > 10 # only include players who played more than 10 games
# we are going to use the zone_list to plot information in each zone
for zone in zones_list:
    # calculate shots per game for specific zone and sort from highest to lowest
    shots_PG = (zones_with_GP.loc[eligible,zone].sum(axis=1)/zones_with_GP.loc[eligible,'GP']).sort_values(ascending=False)
    name = [] # will be used to store the text we want to print
    # run a loop to find top 3 players 
    for j in range(3):
        # create text
        name.append(shots_PG.index[j][0].split(' ')[0][0]+'. ' + shots_PG.index[j][0].split(' ')[1]+':%0.1f' %shots_PG.values[j])
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1)
plt.title('Most Shots by Zone',fontsize=16)
plt.imshow(floor,extent=[-30,30,-7,43]) # plot floor
Out[6]:

Which players have the highest FG% at every zone?

In [7]:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray')
eligible = zones_with_GP.loc[:,'GP'].values > 10 
for zone in zones_list:
    # create new dataframe with total shot, shots per game and FG%
    df = pd.concat([zones_with_GP.loc[eligible,zone].sum(axis=1),
                    zones_with_GP.loc[eligible,zone].sum(axis=1)/zones_with_GP.loc[eligible,'GP'],
                    100.0*zones_with_GP.loc[eligible,(zone[0],zone[1],1)]/zones_with_GP.loc[eligible,zone].sum(axis=1)],axis=1)
    df.columns = ['SHOTS','SHOTS_PG','FGP']
    # only include players with at least 10 shots from this zone who are also (roughly) in the top 100 in shots per game from it
    top100 = df.loc[:,'SHOTS_PG'].sort_values(ascending=False).iloc[100]
    if zone != (u'Back Court Shot', u'Back Court(BC)'):
        mask = (df.loc[:,'SHOTS_PG'] >= top100) & (df.loc[:,'SHOTS']>=10)
    else:
        mask = (df.loc[:,'SHOTS']>=2)    
    # sort by FG%
    perc_leaders = df.iloc[mask.values,:].sort_values('FGP',ascending=False)
    name = []
    for j in range(3):
        name.append(perc_leaders.index[j][0].split(' ')[0][0]+'. ' + perc_leaders.index[j][0].split(' ')[1]+': %0.1f (%d)' %(perc_leaders['FGP'].iloc[j],perc_leaders['SHOTS'].iloc[j]))
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1)
plt.title('Highest Field Goal % by Zone',fontsize=16)
plt.text(-15,-7,'Player: FG % \n (total shots)',horizontalalignment='center')
Out[7]:

For the league average I run the same groupby, but without the PLAYER column. I also add a row with the FG% for each zone.
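As a check on the FGP row, here is the same calculation on two zones whose counts are copied from the Out[8] table (Less Than 8 ft. and Back Court). Using to_frame('FGP').T is my substitute for the pd.DataFrame(..., columns=['FGP']).T call in the cell; both are meant to produce a one-row frame labelled FGP.

```python
import pandas as pd

# League-wide counts for two zones from Out[8] (row 0 = missed, row 1 = made)
cols = pd.MultiIndex.from_tuples([('Less Than 8 ft.', 'Center(C)'),
                                  ('Back Court Shot', 'Back Court(BC)')],
                                 names=['SHOT_ZONE_RANGE', 'SHOT_ZONE_AREA'])
counts = pd.DataFrame([[35594, 528], [46690, 14]], index=[0, 1], columns=cols)

# FG% per zone = made / (missed + made), appended as a row labelled 'FGP'
fgp = counts.loc[1] / counts.sum()
league = pd.concat([counts, fgp.to_frame('FGP').T])
print(league.round(2))
```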

In [8]:
leagueaverage = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG']).size().unstack(fill_value=0).T
leagueaverage = pd.concat([leagueaverage,pd.DataFrame(leagueaverage.loc[1,:]/leagueaverage.sum(),columns=['FGP']).T])
np.round(leagueaverage,2) # round to make display nicer
Out[8]:
SHOT_ZONE_RANGE 16-24 ft. 24+ ft. 8-16 ft. Back Court Shot Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
0 3721.0 3870.00 2560.00 4355.0 2609.00 8148.00 11533.00 4712.00 11304.00 4459.00 5082.00 5382.0 5241.0 528.00 35594.00
1 2487.0 2692.00 1792.00 2870.0 1699.00 4441.00 6238.00 2979.00 6169.00 2843.00 3887.00 3624.0 3447.0 14.00 46690.00
FGP 0.4 0.41 0.41 0.4 0.39 0.35 0.35 0.39 0.35 0.39 0.43 0.4 0.4 0.03 0.57

Next I plot the FG% and the distribution of shots (DST, the percentage of all league shots taken from each zone) for the entire league.
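DST for a zone is that zone's attempts (missed + made) divided by all attempts league-wide. A sketch using the full Out[8] counts, which also confirms that the shares add up to 100%:

```python
import pandas as pd

# League-wide counts per zone, copied from the Out[8] table in column order
zones = [('16-24 ft.', 'Center(C)'), ('16-24 ft.', 'Left Side Center(LC)'),
         ('16-24 ft.', 'Left Side(L)'), ('16-24 ft.', 'Right Side Center(RC)'),
         ('16-24 ft.', 'Right Side(R)'), ('24+ ft.', 'Center(C)'),
         ('24+ ft.', 'Left Side Center(LC)'), ('24+ ft.', 'Left Side(L)'),
         ('24+ ft.', 'Right Side Center(RC)'), ('24+ ft.', 'Right Side(R)'),
         ('8-16 ft.', 'Center(C)'), ('8-16 ft.', 'Left Side(L)'),
         ('8-16 ft.', 'Right Side(R)'), ('Back Court Shot', 'Back Court(BC)'),
         ('Less Than 8 ft.', 'Center(C)')]
missed = [3721, 3870, 2560, 4355, 2609, 8148, 11533, 4712, 11304, 4459, 5082, 5382, 5241, 528, 35594]
made = [2487, 2692, 1792, 2870, 1699, 4441, 6238, 2979, 6169, 2843, 3887, 3624, 3447, 14, 46690]
league = pd.DataFrame([missed, made], index=[0, 1], columns=pd.MultiIndex.from_tuples(zones))

attempts = league.sum()                    # missed + made for each zone
dst = 100.0 * attempts / attempts.sum()    # share of all league attempts, in percent
print(dst.round(1).sort_values(ascending=False))
```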

In [9]:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray')
total_shots = leagueaverage.loc[0,:].sum()+leagueaverage.loc[1,:].sum()
for zone in zones_list:
    name = 'FG%% - %0.1f \nDST - %0.1f' %(100*leagueaverage.loc['FGP',zone],100*(leagueaverage.loc[0,zone]+leagueaverage.loc[1,zone])/total_shots)
    nba.plot.text_in_zone(name,zone,color='black',backgroundcolor = 'white',alpha = 1,fontsize=16)
plt.title('Shooting by Zone (League Average)',fontsize=16)
Out[9]:

Kevin Durant

I'm going to do an analysis similar to the league average, but for a specific player. I chose Kevin Durant, but any player would work. The same approach can also be applied to a team instead of a player, as I show later.
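Since the league-average cell and the Durant cell run the identical groupby, one way to avoid repeating it is a small helper that accepts any subset of the shot chart, whether masked by player or by team. zone_summary is a hypothetical helper (not part of NBAapi) and the shots are made up; the reindex guards against a subset that never misses or never scores.

```python
import pandas as pd

def zone_summary(shots):
    """Hypothetical helper: missed (0) and made (1) counts per zone plus an FGP row."""
    counts = (shots.groupby(['SHOT_ZONE_RANGE', 'SHOT_ZONE_AREA', 'SHOT_MADE_FLAG'])
                   .size()
                   .unstack(fill_value=0)
                   .T
                   .reindex([0, 1], fill_value=0))  # keep both rows even if one flag never occurs
    fgp = counts.loc[1] / counts.sum()
    return pd.concat([counts, fgp.to_frame('FGP').T])

# Made-up shots, all from one zone
shotchart = pd.DataFrame({
    'PLAYER_NAME': ['Player A', 'Player A', 'Player B', 'Player C'],
    'TEAM_NAME': ['Team X', 'Team X', 'Team X', 'Team Y'],
    'SHOT_ZONE_RANGE': ['Less Than 8 ft.'] * 4,
    'SHOT_ZONE_AREA': ['Center(C)'] * 4,
    'SHOT_MADE_FLAG': [1, 0, 1, 1],
})
zone = ('Less Than 8 ft.', 'Center(C)')

# Boolean masks select a player's or a team's shots before summarizing
player = zone_summary(shotchart.loc[shotchart['PLAYER_NAME'] == 'Player A'])
team = zone_summary(shotchart.loc[shotchart['TEAM_NAME'] == 'Team X'])
print(player)
print(team)
```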

In [10]:
durant = shotchart.loc[shotchart['PLAYER_NAME']=='Kevin Durant',:] # create a mask and only include Kevin Durant shots
In [11]:
durant_by_zone = durant.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG']).size().unstack(fill_value=0).T
durant_by_zone= pd.concat([durant_by_zone,pd.DataFrame(durant_by_zone.loc[1,:]/durant_by_zone.sum(),columns=['FGP']).T])
np.round(durant_by_zone,2)
Out[11]:
SHOT_ZONE_RANGE 16-24 ft. 24+ ft. 8-16 ft. Back Court Shot Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
0 13.0 15.00 6.00 27.00 6.00 55.0 39.00 6.0 66.00 16.00 42.00 31.00 37.00 2.0 94.00
1 13.0 17.00 12.00 24.00 13.00 24.0 37.00 9.0 34.00 8.00 44.00 19.00 23.00 0.0 251.00
FGP 0.5 0.53 0.67 0.47 0.68 0.3 0.49 0.6 0.34 0.33 0.51 0.38 0.38 0.0 0.73
In [12]:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='gray')
for zone in zones_list:
    name = ['%0.2f%% (%d)' %(100.0*durant_by_zone.loc['FGP',zone],durant_by_zone.loc[0,zone]+durant_by_zone.loc[1,zone]),
            'LA: %0.2f%%' %(100.0*leagueaverage.loc['FGP',zone])]
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',alpha = 1,fontsize=14)
plt.title('Durant vs. League',fontsize=16)
plt.text(-15,-7,'FG % (total shots) \n League Average',horizontalalignment='center')
Out[12]:

I'm going to do one example with teams instead of players. To get the team stats per zone we need to do the same groupby operation as we did for players, but this time with the 'TEAM_NAME' column:
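The team groupby differs from the player one only in its last key, so it can be parameterized. zone_counts_by and top_fg_pct are hypothetical helpers (not part of NBAapi), shown on made-up shots; top_fg_pct mirrors the FG% ranking done for teams further down.

```python
import pandas as pd

def zone_counts_by(shots, key):
    """Hypothetical helper: one row per value of `key`, one column per (range, area, made flag)."""
    return (shots.groupby(['SHOT_ZONE_RANGE', 'SHOT_ZONE_AREA', 'SHOT_MADE_FLAG', key])
                 .size()
                 .unstack(fill_value=0)
                 .T)

def top_fg_pct(counts, zone, n=3):
    """Hypothetical helper: the n rows with the highest FG% (in percent) for one zone tuple."""
    made = counts[(zone[0], zone[1], 1)]        # made shots in the zone
    attempts = counts.loc[:, zone].sum(axis=1)  # missed + made in the zone
    return (100.0 * made / attempts).sort_values(ascending=False).head(n)

# Made-up shots, all from the same zone
shots = pd.DataFrame({
    'TEAM_NAME': ['Team X', 'Team X', 'Team Y', 'Team Y', 'Team Z', 'Team Z'],
    'SHOT_ZONE_RANGE': ['24+ ft.'] * 6,
    'SHOT_ZONE_AREA': ['Center(C)'] * 6,
    'SHOT_MADE_FLAG': [1, 1, 1, 0, 0, 0],
})
team_by_zone = zone_counts_by(shots, 'TEAM_NAME')
leaders = top_fg_pct(team_by_zone, ('24+ ft.', 'Center(C)'), n=2)
print(leaders)
```

Passing 'PLAYER' instead of 'TEAM_NAME' would rebuild the per-player table from the beginning of the post.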

In [13]:
team_by_zone = shotchart.groupby(['SHOT_ZONE_RANGE','SHOT_ZONE_AREA','SHOT_MADE_FLAG','TEAM_NAME']).size().unstack(fill_value=0).T
team_by_zone
Out[13]:
SHOT_ZONE_RANGE 16-24 ft. ... 8-16 ft. Back Court Shot Less Than 8 ft.
SHOT_ZONE_AREA Center(C) Left Side Center(LC) Left Side(L) Right Side Center(RC) Right Side(R) ... Center(C) Left Side(L) Right Side(R) Back Court(BC) Center(C)
SHOT_MADE_FLAG 0 1 0 1 0 1 0 1 0 1 ... 0 1 0 1 0 1 0 1 0 1
TEAM_NAME
Atlanta Hawks 131 91 122 89 83 58 136 105 99 89 ... 173 123 151 93 182 96 16 0 1217 1533
Boston Celtics 121 71 83 67 58 33 121 93 62 34 ... 111 105 102 79 163 89 24 0 1161 1528
Brooklyn Nets 60 47 48 31 38 21 88 47 64 30 ... 153 124 137 71 158 110 21 1 1333 1667
Charlotte Hornets 136 80 108 86 87 57 181 94 106 69 ... 157 120 169 111 200 135 16 1 1138 1434
Chicago Bulls 197 127 132 82 123 72 170 109 133 75 ... 179 116 222 126 260 173 11 0 1245 1574
Cleveland Cavaliers 68 42 103 64 77 72 104 79 77 65 ... 91 74 201 128 165 124 10 0 1003 1456
Dallas Mavericks 153 120 170 99 104 82 185 108 127 58 ... 163 156 180 135 177 130 18 0 827 1100
Denver Nuggets 127 68 95 61 75 47 93 89 68 36 ... 152 105 167 90 112 80 21 0 1349 1817
Detroit Pistons 131 94 145 112 99 90 128 70 123 82 ... 215 175 263 231 251 167 29 1 1224 1492
Golden State Warriors 99 65 104 92 63 75 158 119 53 59 ... 200 179 135 90 176 116 31 1 967 1657
Houston Rockets 55 37 36 25 33 22 46 34 41 16 ... 101 60 82 46 87 53 16 0 1180 1734
Indiana Pacers 189 148 154 116 131 81 189 127 106 73 ... 190 133 191 136 164 125 22 0 1142 1460
LA Clippers 202 137 162 141 67 54 192 158 103 77 ... 148 115 116 88 115 73 16 3 992 1453
Los Angeles Lakers 98 75 140 96 73 38 125 67 53 42 ... 211 153 190 100 202 114 17 1 1342 1672
Memphis Grizzlies 121 53 137 109 89 65 131 96 110 41 ... 176 114 199 122 115 73 34 0 1304 1469
Miami Heat 86 64 165 96 85 55 122 91 89 41 ... 201 154 193 121 163 99 16 0 1203 1548
Milwaukee Bucks 96 60 98 72 99 80 78 29 55 34 ... 105 57 184 94 145 113 12 0 1362 1844
Minnesota Timberwolves 141 96 184 121 118 94 192 111 90 75 ... 131 89 212 128 175 122 11 1 1200 1659
New Orleans Pelicans 122 73 163 96 120 62 147 100 70 45 ... 136 117 197 138 161 110 13 0 1255 1558
New York Knicks 128 93 159 124 124 85 175 114 117 91 ... 145 134 249 196 243 165 22 0 1270 1446
Oklahoma City Thunder 96 66 113 75 73 35 126 73 50 31 ... 173 147 190 102 188 102 17 2 1350 1788
Orlando Magic 118 77 166 93 80 57 186 111 110 78 ... 208 137 205 177 220 133 25 0 1169 1495
Philadelphia 76ers 106 52 80 44 57 33 134 59 70 47 ... 156 101 151 95 196 110 7 0 1264 1636
Phoenix Suns 147 89 156 105 105 51 208 135 106 52 ... 242 179 195 149 211 134 20 1 1280 1657
Portland Trail Blazers 122 86 159 122 83 52 137 104 79 58 ... 206 164 134 112 127 100 15 0 1248 1516
Sacramento Kings 142 96 122 94 81 53 163 95 84 54 ... 220 177 133 83 161 73 19 0 1184 1506
San Antonio Spurs 147 122 162 128 106 96 179 118 105 69 ... 157 102 257 178 213 137 13 0 1007 1401
Toronto Raptors 131 98 146 94 85 68 155 97 90 59 ... 216 179 211 159 160 119 10 1 1158 1525
Utah Jazz 102 63 94 60 77 31 127 67 74 47 ... 189 150 144 100 178 139 13 1 1055 1472
Washington Wizards 149 97 164 98 67 73 179 171 95 72 ... 177 148 222 146 173 133 13 0 1165 1593

30 rows × 30 columns

Which teams have the highest FG% at every zone?

In [15]:
plt.figure(figsize=(15,12.5),facecolor='white')
ax = nba.plot.court(lw=4,outer_lines=False)
ax.axis('off')
nba.plot.zones(color='white',linewidth=3) 
for zone in zones_list:
    # create series and sort by FG%
    perc_leaders = (100.0*team_by_zone.loc[:,(zone[0],zone[1],1)]/team_by_zone.loc[:,zone].sum(axis=1)).sort_values(ascending=False)
    name = []
    for j in range(3):
        name.append(perc_leaders.index[j]+': %0.1f' %(perc_leaders.iloc[j]))
    nba.plot.text_in_zone('\n'.join(name),zone,color='black',backgroundcolor = 'white',fontsize=11)
plt.title('Highest Field Goal % by Zone',fontsize=16)
plt.imshow(floor,extent=[-30,30,-7,43])
Out[15]:

References:

  1. Court plotting function modified from - http://savvastjortjoglou.com/
  2. Accessing the NBA API - http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/
  3. Another NBA package for Python - https://pypi.python.org/pypi/nbastats/1.0.0
In [16]:
print(pd.__version__)
print(np.__version__)
print(sys.version)
0.19.1
1.12.0
2.7.11 |Anaconda custom (64-bit)| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)]
